AITopics | distillation method

Collaborating Authors

distillation method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Knowledge Gets Distilled in Knowledge Distillation? Utkarsh Ojha Yuheng Li Anirudh Sundara Rajan Yingyu Liang Yong Jae Lee University of Wisconsin-Madison

Neural Information Processing SystemsApr-25-2026, 22:47:11 GMT

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has a been a deluge of novel techniques and use cases of knowledge distillation. Yet, despite the various improvements, there seems to be a glaring gap in the community's fundamental understanding of the process. Specifically, what is the knowledge that gets distilled in knowledge distillation? In other words, in what ways does the student become similar to the teacher?

artificial intelligence, machine learning, student, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin > Dane County > Madison (0.40)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Add feedback

082a8bbf2c357c09f26675f9cf5bcba3-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 13:31:41 GMT

artificial intelligence, classification teacher, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Distilling Image Classifiers in Object Detectors

Neural Information Processing SystemsApr-24-2026, 13:31:38 GMT

Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains limited to the scenario where the student and the teacher tackle the same task. Here, we investigate the problem of transferring knowledge not only across architectures but also across tasks. To this end, we study the case of object detection and, instead of following the standard detector-to-detector distillation approach, introduce a classifier-to-detector knowledge transfer framework. In particular, we propose strategies to exploit the classification teacher to improve both the detector's recognition accuracy and localization performance. Our experiments on several detectors with different backbones demonstrate the effectiveness of our approach, allowing us to outperform the state-of-the-art detector-to-detector distillation methods.

artificial intelligence, distillation, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Label is Worth A Thousand Images in Dataset Distillation

Neural Information Processing SystemsMar-22-2026, 20:10:30 GMT

Data is a crucial factor in the performance of machine learning models, a principle that dataset distillation methods exploit by compressing training datasets into much smaller counterparts that maintain similar downstream performance. Understanding how and why data distillation methods work is vital not only for improving these methods but also for revealing fundamental characteristics of good" training data. However, a major challenge in achieving this goal is the observation that distillation approaches, which rely on sophisticated but mostly disparate methods to generate synthetic data, have little in common with each other. In this work, we highlight a largely overlooked aspect common to most of these methods: the use of soft (probabilistic) labels. Through a series of ablation experiments, we study the role of soft labels in depth. Our results reveal that the main factor explaining the performance of state-of-the-art distillation methods is not the specific techniques used to generate synthetic data but rather the use of soft labels. Furthermore, we demonstrate that not all soft labels are created equal; they must contain to be beneficial. We also provide empirical scaling laws that characterize the effectiveness of soft labels as a function of images-per-class in the distilled dataset and establish an empirical Pareto frontier for data-efficient learning. Combined, our findings challenge conventional wisdom in dataset distillation, underscore the importance of soft labels in learning, and suggest new directions for improving distillation methods.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation

Neural Information Processing SystemsMar-21-2026, 05:05:37 GMT

Since pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved compelling performance. However, KL-Div only compares probabilities of the corresponding category between the teacher and student while lacking a mechanism for cross-category comparison. Besides, KL-Div is problematic when applied to intermediate layers, as it cannot handle non-overlapping distributions and is unaware of geometry of the underlying manifold. To address these downsides, we propose a methodology of Wasserstein Distance (WD) based knowledge distillation. Specifically, we propose a logit distillation method called WKD-L based on discrete WD, which performs cross-category comparison of probabilities and thus can explicitly leverage rich interrelations among categories. Moreover, we introduce a feature distillation method called WKD-F, which uses a parametric method for modeling feature distributions and adopts continuous WD for transferring knowledge from intermediate layers. Comprehensive evaluations on image classification and object detection have shown (1) for logit distillation WKD-L outperforms very strong KL-Div variants; (2) for feature distillation WKD-F is superior to the KL-Div counterparts and state-of-the-art competitors.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

4ec0b6648bdf487a2f1c815924339022-Paper-Conference.pdf

Neural Information Processing SystemsFeb-19-2026, 03:15:02 GMT

In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effectofthefeatureprojector between thestudent andtheteacher remains underexplored.

artificial intelligence, machine learning, projector, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Label is Worth a Thousand Images in Dataset Distillation

Neural Information Processing SystemsFeb-18-2026, 15:13:13 GMT

Understanding how and why data distillation methods work is vital not only for improving these methods but also for revealing fundamental characteristics of "good" training data.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.69)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sequential Subset Matching for Dataset Distillation

Neural Information Processing SystemsFeb-17-2026, 08:09:38 GMT

The synthetic datasets are expected to capture the essence of the knowledge contained in real-world datasets such that the former yields a similar performance as the latter.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.04)
Africa > Togo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

a12779b5e802668df1cbc73fa00da62f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 06:01:43 GMT

artificial intelligence, machine learning, semantic segmentation, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.05)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Improving the Training of Rectified Flows

Neural Information Processing SystemsFeb-15-2026, 20:42:39 GMT

One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country: